Caio Raphael

Shader code converts high-dynamic-range (HDR) linear color values (often stored in floating-point formats like R16G16B16A16_SFLOAT) into display-referred low-dynamic-range (LDR) values (sRGB or the swapchain format).
Operations include exposure, clamping, tone curve (Reinhard, ACES, filmic), and gamma or sRGB conversion.
Each monitor manufacturer does this differently; it's not standardized.
Inputs:
- HDR color (linear), optionally exposure/exposure texture, bloom, eye adaptation.
Steps (minimal example):
1. Multiply by exposure.
2. Apply a tone curve (e.g. Reinhard: c/(1+c), or an ACES approximation).
3. Convert to sRGB/gamma (pow(color, 1.0/2.2)) or use the proper piecewise sRGB conversion.
4. Output a vec4 clamped to [0,1] into the swapchain format (e.g. FORMAT_B8G8R8A8_UNORM).
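
The four steps above can be sketched on the CPU in C++ to make the math concrete. This is only an illustration, not shader code from any engine: the names `Color3`, `reinhard`, `linear_to_srgb`, and `tonemap` are hypothetical, Reinhard is picked as the simplest curve mentioned above, and the piecewise sRGB encoding is used instead of the `pow(color, 1.0/2.2)` approximation.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical linear RGB triple, standing in for a shader vec3.
struct Color3 { float r, g, b; };

// Step 2: Reinhard curve c / (1 + c), applied per channel.
inline float reinhard(float c) { return c / (1.0f + c); }

// Step 3: piecewise sRGB encoding (linear -> sRGB) for a value in [0, 1].
inline float linear_to_srgb(float c) {
    return (c <= 0.0031308f) ? 12.92f * c
                             : 1.055f * std::pow(c, 1.0f / 2.4f) - 0.055f;
}

// Runs one channel through all four steps.
inline float tonemap_channel(float hdr, float exposure) {
    float c = std::max(hdr, 0.0f) * exposure;   // 1. exposure (negatives dropped)
    c = reinhard(c);                            // 2. tone curve, result in [0, 1)
    c = linear_to_srgb(c);                      // 3. sRGB encoding
    return std::min(std::max(c, 0.0f), 1.0f);   // 4. clamp to [0, 1]
}

inline Color3 tonemap(Color3 hdr, float exposure) {
    return { tonemap_channel(hdr.r, exposure),
             tonemap_channel(hdr.g, exposure),
             tonemap_channel(hdr.b, exposure) };
}
```

For example, `tonemap({4.0f, 1.0f, 0.0f}, 1.0f)` keeps the bright red channel below 1.0 instead of clipping it. If the swapchain uses an _SRGB format rather than a UNORM one, the hardware performs the sRGB encoding on write, so step 3 would be skipped in the shader.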

Drawing to a High Precision Image (`R16G16B16A16_SFLOAT`)

Rendering into an R16G16B16A16_SFLOAT (FP16) image provides:
- Higher dynamic range and precision (light accumulation > 1.0, less banding, better tone mapping).
- Freedom to tone-map and convert later.
This is the engine-side HDR pipeline.
Described technique.
- From "New draw loop" until the end.
Rendering into a separate high-precision offscreen target and then copying, blitting, or tone-mapping into the swapchain is the standard approach when you need an arbitrary internal resolution, higher precision, or HDR processing, or when the swapchain does not expose the desired formats or usages. The trade-off is the extra memory and an explicit copy/blit or import step; the benefit is control over precision and size. The usual mechanisms for moving from the internal target to the presentable image are vkCmdBlitImage (which needs transfer usage on both images) or a shader-based blit/resolve.
The image we will be using is going to be in the RGBA 16-bit float format.
- R16G16B16A16_SFLOAT is a common intermediate HDR format (16-bit float per channel). It increases memory and bandwidth (roughly 2× vs 8-bit RGBA) and may affect GPU/VRAM usage and upload/download costs; it also reduces quantization/banding and supports HDR/light-accumulation workflows without clamping at 1.0. The choice is an explicit trade-off: more precision (and headroom for lighting) vs more memory/bandwidth. The format is widely supported for offscreen images but may not be available as a swapchain format on all platforms, which reinforces the decision to render offscreen then convert/tonemap for presentation.
- This is slightly overkill, but will provide us with a lot of extra pixel precision that will come in handy when doing lighting calculations and better rendering.
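
To put the "roughly 2×" figure into numbers, here is a small sketch that computes the raw size of a single 2D image. The helper `image_bytes` and the two constants are hypothetical names; the result ignores mipmaps and any padding or alignment the driver may add, so treat it as a lower bound.

```cpp
#include <cstdint>

// Bytes per texel for the two formats discussed above.
constexpr std::uint32_t kBytesRGBA8   = 4;  // 4 channels x 8 bits
constexpr std::uint32_t kBytesRGBA16F = 8;  // 4 channels x 16 bits

// Raw size of one 2D image, without mipmaps or driver padding.
constexpr std::uint64_t image_bytes(std::uint32_t width,
                                    std::uint32_t height,
                                    std::uint32_t bytes_per_texel) {
    return static_cast<std::uint64_t>(width) * height * bytes_per_texel;
}
```

At 1920×1080 this gives 8,294,400 bytes for 8-bit RGBA and 16,588,800 bytes for R16G16B16A16_SFLOAT, so a single draw image costs about 8 MB more.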
It's possible to apply low-latency techniques in which we render into an image that is separate from the swapchain image, and then push that image to the swapchain with very low latency.
- Techniques like NVIDIA's "Latency Markers" / Reflex or AMD's Anti-Lag rely on starting rendering work as early as possible, often before the presentation engine hands out the next image through vkAcquireNextImageKHR. This requires rendering into a separate, persistently available image: the swapchain image index is only known at acquisition time, so work that targets a swapchain image directly cannot begin before that point. The documentation for these low-latency SDKs does not state this outright, but it implicitly assumes separate render targets.
Choosing the image tiling:
- We can then copy that image into the swapchain image and present it to the screen.
- vkCmdCopyImage
  - Is faster, but it's much more restricted; for example, the resolution of both images must match.
- vkCmdBlitImage
  - Lets you copy between images of different formats and different sizes.
  - You give it a source rectangle and a target rectangle, and the system copies the source into the target's position, scaling if needed.
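
The choice between the two commands can be sketched as a small decision helper. This is a deliberately simplified model, not a restatement of the Vulkan rules: `ImageDesc`, `TransferCommand`, and `choose_transfer` are hypothetical names, and the format is reduced to an opaque integer id. The real copy command accepts differing formats as long as they are size-compatible, but it moves raw bits without converting them, so the sketch conservatively asks for identical formats. A real blit also needs the blit source/destination format features to be supported.

```cpp
#include <cstdint>

// Hypothetical description of an image: size plus an opaque format id.
struct ImageDesc {
    std::uint32_t width;
    std::uint32_t height;
    int format;  // stand-in for a VkFormat value
};

enum class TransferCommand { CopyImage, BlitImage };

// Picks the cheaper copy when no scaling or format conversion is needed,
// and falls back to a blit otherwise.
inline TransferCommand choose_transfer(const ImageDesc& src, const ImageDesc& dst) {
    const bool sameSize   = src.width == dst.width && src.height == dst.height;
    const bool sameFormat = src.format == dst.format;
    return (sameSize && sameFormat) ? TransferCommand::CopyImage
                                    : TransferCommand::BlitImage;
}
```

For the setup in these notes (an R16G16B16A16_SFLOAT draw image going into an 8-bit swapchain image) the formats differ, so the helper picks the blit even when both images have the same resolution.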

New code for transitioning:

_drawExtent.width = _drawImage.imageExtent.width;
_drawExtent.height = _drawImage.imageExtent.height;

CHECK(vkBeginCommandBuffer(cmd, &cmdBeginInfo)); 

// transition our main draw image into general layout so we can write into it
// we will overwrite it all, so we don't care what the older layout was
vkutil::transition_image(cmd, _drawImage.image, IMAGE_LAYOUT_UNDEFINED, IMAGE_LAYOUT_GENERAL);

draw_background(cmd);

//transition the draw image and the swapchain image into their correct transfer layouts
vkutil::transition_image(cmd, _drawImage.image, IMAGE_LAYOUT_GENERAL, IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL);
vkutil::transition_image(cmd, _swapchainImages[swapchainImageIndex], IMAGE_LAYOUT_UNDEFINED, IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL);

// execute a copy from the draw image into the swapchain
vkutil::copy_image_to_image(cmd, _drawImage.image, _swapchainImages[swapchainImageIndex], _drawExtent, _swapchainExtent);

// set swapchain image layout to Present so we can show it on the screen
vkutil::transition_image(cmd, _swapchainImages[swapchainImageIndex], IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, IMAGE_LAYOUT_PRESENT_SRC_KHR);

//finalize the command buffer (we can no longer add commands, but it can now be executed)
CHECK(vkEndCommandBuffer(cmd));

The main difference in the render loop is that we no longer do the clear on the swapchain image. Instead, we do it on _drawImage.image.
Once we have cleared the image, we transition both the swapchain image and the draw image into their transfer layouts and execute the copy command. Once the copy is done, we transition the swapchain image into the present layout for display. As we are always drawing on the same image, draw_background does not need the swapchain image index; it just clears the draw image. We also write _drawExtent, which we will use as our draw region.

Etc

But this image still has to be copied/tonemapped into the swapchain format, which is typically limited to 8-bit UNORM unless the OS/driver supports HDR swapchain formats.
To actually output HDR to the screen, all of the following conditions must be met:

Swapchain format must support HDR bit depth
- Example formats: FORMAT_A2B10G10R10_UNORM_PACK32, FORMAT_R16G16B16A16_SFLOAT, or platform-specific HDR surface formats.
- You query available swapchain formats via vkGetPhysicalDeviceSurfaceFormatsKHR.
- If only 8-bit formats are exposed, you cannot present HDR directly.
Swapchain color space must be HDR-capable
- Vulkan allows specifying a VkColorSpaceKHR (e.g., COLOR_SPACE_HDR10_ST2084_EXT, COLOR_SPACE_HDR10_HLG_EXT).
- These correspond to HDR transfer functions (PQ/HLG).
- If the driver/surface does not expose them, the system compositor won’t accept HDR content.
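
The format and color-space checks described so far can be sketched as a search over the list of (format, color space) pairs the surface reports. To stay self-contained, the sketch uses small local enums as stand-ins for the real VkFormat and VkColorSpaceKHR values; `SurfaceFormat`, `is_hdr_pair`, and `find_hdr_format` are hypothetical names, and only the formats and color spaces named in these notes are modeled.

```cpp
#include <cstddef>
#include <vector>

// Local stand-ins for the VkFormat / VkColorSpaceKHR values named above.
enum class Format { B8G8R8A8_UNORM, A2B10G10R10_UNORM_PACK32, R16G16B16A16_SFLOAT };
enum class ColorSpace { SRGB_NONLINEAR, HDR10_ST2084, HDR10_HLG };

struct SurfaceFormat {
    Format format;
    ColorSpace colorSpace;
};

// True when both the bit depth and the transfer function are HDR-capable.
inline bool is_hdr_pair(const SurfaceFormat& f) {
    const bool deepFormat = f.format == Format::A2B10G10R10_UNORM_PACK32 ||
                            f.format == Format::R16G16B16A16_SFLOAT;
    const bool hdrSpace = f.colorSpace == ColorSpace::HDR10_ST2084 ||
                          f.colorSpace == ColorSpace::HDR10_HLG;
    return deepFormat && hdrSpace;
}

// Returns the index of the first HDR-capable pair, or -1 if there is none,
// in which case the application would fall back to an 8-bit SDR swapchain.
inline int find_hdr_format(const std::vector<SurfaceFormat>& available) {
    for (std::size_t i = 0; i < available.size(); ++i) {
        if (is_hdr_pair(available[i])) return static_cast<int>(i);
    }
    return -1;
}
```

In a real application the list would come from vkGetPhysicalDeviceSurfaceFormatsKHR, and a result of -1 corresponds to the "cannot present HDR directly" case.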
OS and display pipeline must be HDR-enabled
- Windows: HDR toggle must be enabled in system settings, compositor configured for HDR10.
- Linux/Wayland: requires HDR support in compositor + driver (still emerging).
- Android: requires AHardwareBuffer / SurfaceView with HDR formats.
- macOS: Metal swapchains expose extended sRGB/PQ output modes.
- (Platform docs confirm HDR availability is compositor-driven).
Application side tone mapping & gamut mapping
- Even if swapchain supports HDR, you generally still render into FP16, then apply:
  - Tone mapping (map wide dynamic range → HDR10/HLG range).
  - Color gamut conversion (usually Rec.709 → Rec.2020 for HDR10).
- Only then write into the HDR swapchain image.
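
The application-side steps for an HDR10 (PQ) target can be sketched as follows. This is a CPU illustration under stated assumptions, not a complete HDR output path: `LinearRgb`, `rec709_to_rec2020`, and `pq_encode` are hypothetical names, the matrix is the commonly published linear Rec.709 to Rec.2020 conversion rounded to four decimals, and the input to `pq_encode` is assumed to be already scaled to absolute luminance in nits. How scene values map to nits is an application choice that these notes do not specify.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical linear-light RGB triple.
struct LinearRgb { float r, g, b; };

// Converts linear Rec.709 primaries to linear Rec.2020 primaries.
inline LinearRgb rec709_to_rec2020(LinearRgb c) {
    return {
        0.6274f * c.r + 0.3293f * c.g + 0.0433f * c.b,
        0.0691f * c.r + 0.9195f * c.g + 0.0114f * c.b,
        0.0164f * c.r + 0.0880f * c.g + 0.8956f * c.b,
    };
}

// PQ (SMPTE ST 2084) encoding: luminance in nits -> signal in [0, 1].
// 10000 nits is the top of the PQ range.
inline float pq_encode(float nits) {
    const float m1 = 0.1593017578125f;
    const float m2 = 78.84375f;
    const float c1 = 0.8359375f;
    const float c2 = 18.8515625f;
    const float c3 = 18.6875f;
    const float L  = std::min(std::max(nits / 10000.0f, 0.0f), 1.0f);
    const float Lm = std::pow(L, m1);
    return std::pow((c1 + c2 * Lm) / (1.0f + c3 * Lm), m2);
}
```

A pixel would go through `rec709_to_rec2020` first and then have `pq_encode` applied to each channel before being written to the HDR swapchain image. For reference, 100 nits encodes to a signal value of roughly 0.51.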
